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FURTHER  EXAMINATION  OF  SIMULATED  PERCENTAGE  POINTS 


By 

M.A.  Stephens 
Simon  Fraser  University 


1.  Introduction. 

In  a  previous  report,  Juritz,  Juritz  and  Stephens  (1981)  discussed 
methods  of  obtaining  percentage  points  of  distributions  by  Monte  Carlo 
studies,  either  by  using  one  long  run,  or  using  the  average  of  several 
smaller  runs.  The  single  run  has  certain  advantages — in  particular,  less 
bias.  The  two  methods  were  illustrated  using  Monte  Carlo  studies.  In 
this  follow  up,  the  comparison  is  continued,  using  as  illustrations  the 
normal  and  exponential  distributions,  for  which  exact  values  of  order 
statistics  can  easily  be  found.  An  amalgamation  of  the  earlier  report 
and  this  one  is  now  published  in  Juritz,  Juritz  and  Stephens  (1983). 

2.  The  Single-Sample  Method. 

We  first  summarize  previous  results.  Let  f(x)  be  the  probability 

density  function  of  a  continuous  random  variable  X,  let  0  <  p  <  1, 

and  let  £  denote  the  (100p)th  percentile  of  X.  Let  the  order  statis- 
P 

tics  of  a  random  sample  of  size  n  drawn  from  f(x)  be 

X(l)  <  X(2)  <  **•  <  X(n)'  Suppose  k  *  [np]  +  1,  where  [m]  denotes 

the  largest  integer  less  than  or  equal  to  m;  X^j  is  the  usual  estimator 

of  £  ,  from  the  run  of  n  Monte  Carlo  values  of  X.  Furthermore,  for 
P 

X  continuous,  the  random  interval  ^x(r)»X(8)^  wi*h  r  <  s  covers 
with  probability  ir(r,s,n,p)  given  by 


1 


s-l  . 

Prob(X^rj£Cp<X(gj)  -  ir(r ,s,n,p)  -  £  (“)pl(l-p) 


where 


p  ■  Prob(X£Cp)  • 

1/2 

Let  w  *  (np(l-p)}  ,  and  let  denote  the  (lOOy)th  percentile 

of  the  standard  normal  distribution.  Then  with 


(1) 


-wz  +  np  + 


and 


where  y  *  l-a/2,  the  interval  lg  *  (X^,X 

interval  for  £  at  level  100(l-a)Z,  since 
P 

1-a  (approximately,  but  to  high  accuracy). 


(s) 


)  gives  a  confidence  . 


ir(r,s,n,p)  Ib  then 


3.  The  Multi-Sample  Method. 

In  this  method,  proposed  by  Schafer  (1974),  c  runs  with  m  repeti¬ 
tions  (n « cm)  are  made.  For  each  run  the  sample  percentile  X^,  where 

Z  ■  [mp]  +  1,  estimates  (,  .  Let  X/0.  be  called  X  ,  and  let  the  c 

P  (Z)  p 

values  of  X  be  X  , ,X  X  ;  the  percentile  £  is  estimated  by 

p  pi  p2  pc  r  p 

the  average 


(2) 


and  the  variance 


of 


X 


P 


X . is  estimated  by 
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S2  “  I  (Xpi~Xp)2/ (c-1)  . 

The  multi-sample  confidence  interval  is 

<4)  *.  5  <*p  -  tx-a/2Sc'1/2-  ’'p  +  tl^/2S<='l/2>  > 

this  is  a  confidence  interval  for  with  approximate  confidence 

probability  1-a,  where  t^_ay2  is  t*ie  uPPer  100(l-a/2)  percentile 
of  Student's  t  with  c-1  degrees  of  freedom.  The  number  of  runs  must 
be  chosen  to  give  a  good  estimate  of  the  standard  deviation. 

4.  Bias  in  Point  Estimates. 

The  question  now  arises:  is  it  best  to  make  c  runs  and  then  take 

A 

the  mean  Xp,  as  the  point  estimate  of  or  to  make  one  large 

run  and  take  X^j  38  the  estimate?  In  both  cases  the  point  estimates 
are  biased  in  finite  samples. 

Let  Q(*)  be  the  inverse  function  of  the  parent  population,  and 
let  Q^(»)  be  its  I1*1  derivative.  In  the  previous  report  results 
in  David  (1970)  were  used  to  show  that,  to  terms  of  order  1/n, 

1— p—C 

(5)  *»<„>>  -  Q(p)  +  —ji  Q<1) (P)  ♦  Q<2,(P>  . 

where  »  np  -  [np], 

A 

When  several  samples  are  used,  the  estimate  £  of  (  is 

P  P 

Xp  -  X^  where  f.  is  tmpj  +  1.  Then  to  order  1/m, 
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l*p— c 

<«  *«<«>  -  e«(1)>  -  Q(p> +  -sr Q<1)(p)  ♦  ns!)  »2<p> 

where  Ej  ■  mp  -  [mp]. 

In  most  practical  applications,  for  the  sample  sizes  and  p-values 
one  would  use  in  estimating  percentiles  accurately  from  Monte  Carlo 
samples,  both  and  z^  would  be  zero.  Then  from  (5)  and  (6)  the 

bias  of  the  single  sample  estimate  is  smaller  than  that  of  the  multi¬ 
sample  estimate  by  a  factor  of  size  approximately  c. 

5.  Expected  Lengths  of  Confidence  Intervals. 

It  is  worthwhile  to  compare  the  expected  lengths  of  the  confidence 
intervals  (Cl)  for  £p.  obtained  by  the  two  methods.  Cl  for  Monte 
Carlo  points  are  rarely  given,  although  they  would  be  useful  when  using 
such  points  to  check  other  methods  of  obtaining  points,  for  example,  by 
curve-fitting  to  the  moments.  For  the  single-sample  case  the  Cl  is 
Ig  =  (X^rj,X^sj),  where  r  and  s  are  given  in  (1),  and  this  is  very 
easily  found.  The  expected  length  L_  of  Is  is  E(X,  .  -  X,  .);  we 
have 

Ls  *  Q(ps)  "  Q(pr)  +  °U/<n+2»  » 

-1/2 

where  ps  ■  s/(n+l)  and  pr  ■  r/(n+l).  To  order  n  this  is,  using 

(5) 

(7)  La  -  Q(1>(P)  2z1_a/2{p(l-p)}i/2/ni/2  . 
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For  the  multi-sample  method  the  CX  is  IB  given  In  (4).  For  its 

1/2  2 

•xpsctsd  length  we  need  E(S/c  );  E(S  )  is  Var(X^),  which  for 
the  normal  case  can  be  found  quite  accurately  using  the  method  of  Davis 
and  Stephens  (1978).  However,  for  large  samples,  as  here,  we  can  use 
(David,  1970,  p.  65) 


(8) 


Var{Xa))  - 


P«9 


in 

*4-2 


{Q(1)(P£)}2  + 


0{l/(m+2)z} 


where  p^  -  i/(n+l)  and  q^  ■  1-p^.  Thus,  to  order  1/n,  the  variance 

2  — 
a  /c  of  the  multi-sample  estimate  X  is 

P 


(9) 


Var(Xp)  -  Var(Xa)) 


and  the  expected  length  L  of  the  confidence  interval  I  is,  to  order 

ib  a 

n'1/2 


(10)  Ln  -  2{p(l-p)}1/2  Q(1)(p)  t1-a/2/n1/2  . 

This  is  the  same  as  Lg  but  with  t(j_a/2)  rePl*cinR  z(i-a/2)"  Thus» 

since  c  will  usually  be  quite  small,  Ln  will  be  larger  than  Ls; 

also  the  accuracy  of  I  may  be  poor  since  it  depends  on  the  normal 

a 

distribution  for  the  values  X  , ,X  X  ,  and  this  does  not  hold 

pi  pc 

well  for  extreme  order  statistics.  There  is  also  a  difference  in  the 
nature  of  the  confidence  intervals:  the  interval  I  obtained  by  the 
multi-sample  method  will  have  endpoints  equally  spaced  about  the  estimate 
X^j ,  while  I8  is  designed  so  that  the  endpoints  have  estimated 
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significance  levels  equally  spaced  about  p.  For  many  uses  of  confidence 
intervals  the  second  property  may  veil  be  the  more  desirable. 


6.  Examples. 

6.1.  Point  estimates. 

The  comparison  between  the  single-sample  and  the  multi-sample  method 
can  be  illustrated  very  well  by  the  normal  and  exponential  distributions, 
for  which  expected  values  of  order  statistics  =  E{X^}  are  exten¬ 
sively  tabulated  or  can  be  easily  calculated. 

The  normal  distribution.  Consider  the  estimation  of  £  the 

95th  percentile  of  the  normal  distribution  which  has  a  true  value  of 
1.6449.  Table  1  gives  values  of  for  the  relevant  order  statistics, 

(from  Harter,  1961,  for  n  £  400)  and  the  bias  in  the  estimates.  Table  1 
can  be  used  as  follows.  Suppose  the  choice  of  estimator  is  between 
n  »  400  replicates  in  one  run,  or  c  *  4  runs  of  m  ■  100  replicates. 

For  the  multi-sample  case,  the  estimator  is  with  expected  value 

1.6872;  for  one  sample,  the  estimator  X^gjj  has  expected  value  1.6553. 
The  bias  is  reduced,  as  expected,  by  a  factor  of  about  4.  For  larger 
values  of  n  the  first  terms  in  (5)  give  an  excellent  approximation  to 
m^;  for  example,  for  n  *  400,  k  -  381,  (5)  gives  1.6552  to  compare  with 

the  exact  value  1.6553.  Another  very  good  approximation  is  that  given  by 
Blom  (1958):  m^  v  Q(Zk)  where  -  (k - 0.375)/(n  + 0.25);  for  n  ■  400, 
k  ■  389  Blom's  formula  gives  1.6543.  Equation  (5)  is  marginally  more 
accurate  so  we  use  it  for  values  of  n  greater  than  400  (the  limit  of 
Harter's  (1961)  table). 
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Suppose,  for  a  second  example,  the  choice  is  between  one  run  with 
n  «  10000  replicates  and  c  ■  10  runs  of  m  ■  1000  replicates,  and  suppose 


the  estimate  is  required  for  £  which  has  a  true  value  1.9600. 

Table  1,  in  the  second  part,  shows  that  the  bias  for  the  single-sample 
case  is  0.0007,  approximately  one-tenth  that  of  the  bias  (0.0074)  in 
the  multi-sample  case,  as  expected. 

The  table  also  shows  cases  where  the  multi-sample  bias  can  be  lower 
than  the  single-sample  bias.  For  example,  the  bias  using  m  »  500  when 
estimating  £  is,  from  the  second  part  of  Table  1,  -0.0026;  if 

c  *  2,  the  single-sample  would  have  1000  samples  for  which  the  bias 
is  0.0074.  However,  if  a  confidence  interval  were  required  for  £ 
it  would  be  foolish  to  base  the  variance  estimate  on  only  2  runs;  and  for 
c  ■  4  or  more  the  bias  is  again  smaller  using  only  one  large  sample. 

The  anomaly  arises  only  because  when  m  *  500,  p  *  .975,  the  quantity 
mp  is  not  an  integer,  and  it  happens  then  that  gives  a  bias 

smaller  than  the  bias  in  m^^  for  n  ■  1000;  it  is  also  negative. 

In  practice  the  multi-sample  bias  will  be  smaller  only  in  somewhat  con¬ 
trived  examples  such  as  the  above. 


6.2.  Confidence  intervals. 

(a)  Accuracy  of  the  single-sample  C.I. 

Table  1  also  gives  the  values  of  r  and  s  used  to  calculate  I8, 
the  confidence  interval  given  by  the  single-sample  method,  Ls,  the 
exact  length  of  lg,  and  the  length  as  estimated  from  (7);  it  can  be 
seen  that  (7)  gives  a  slightly  smaller  length  than  the  correct  Ls  in 
smaller  samples. 
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(b)  Accuracy  of  the  multi-sample  C.I. 

In  determining  the  accuracy  of  above  we  do  not  use  the  approxi¬ 

mation  (8)  for  the  variance  of  an  order  statistic.  However,  the  approxi¬ 
mation  (10)  for  L&  makes  use  of  (8),  and  this  is  an  approximation  which 
cannot  easily  be  checked  accurately,  since  exact  tables  of  the  variance  of 
X^  do  not  exist  for  large  samples;  (9)  above  gives  an  approximation  to 
order  1/n,  and  Davis  and  Stephens  (1978)  give  a  good  approximation. 

However,  we  have  taken  Monte  Carlo  samples  to  examine  confidence  intervals 
further,  and  to  see  the  variations  of  estimates  in  practice. 

A  test  will  consist  of  c  runs  of  m,  to  give  the  multi-sample 

results,  and  one  run  of  n  -  cm  obtained  by  combining  the  c  runs  into 

one.  In  the  study  to  be  reported,  c  was  taken  as  10.  For  each  test, 

the  sample  mean  and  the  standard  deviation  S  (equations  (2)  and  (3) 

above)  of  the  10  estimates  of  C  55*  based  on  m  values,  were  calculated, 

as  they  would  be  in  the  multi-sample  method,  and  the  95Z  confidence  interval 

I  found  from  (4).  In  the  top  half  of  Table  2  these  values  are  given  for 
n 

two  values  of  m,  100  and  500.  The  bias  in  the  estimate  and  the  length  of 
the  confidence  interval  are  also  given. 

In  the  bottom  half  of  the  table  are  recorded  the  results  for  the  single¬ 
sample  method.  Since  6  tests  have  been  made,  the  standard  deviation  of  Xp 
for  this  method  can  be  estimated  in  the  usual  way.  It  can  be  seen  from  the 
table  that  the  Monte  Carlo  estimates  of  bias  are  around  the  values  given  in 
Table  1;  also  the  sample  variances  of  the  estimates  agree,  to  within  sampling 
fluctuations,  with  values  calculated  from  equation  (9).  The  lengths  of  the 
confidence  Intervals  are  roughly  the  same  by  each  method,  as  predicted  in 
Section  5;  S  is  much  1-  ger,  say,  for  a  sample  of  m  »  100  than  it  is  for 
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a  sample  of  n  =  1000,  but  in  calculating  the  confidence  interval  it  must 

be  divided  by  /10,  and  this  brings  the  length  of  the  intervals  for  the 

multi-sample  method  (top  half  of  Table  2)  to  roughly  the  same  size  as  those 

for  the  single-sample  method  (bottom  half  of  Table  2).  Note  that  there 

appears  to  be  some  correlation  between  the  length  of  I  and  that  of  1  . 

m  s 

Results  for  p  *  0.975  are  of  the  same  pattern,  although  they  are 
not  given  here.  So  also  are  the  results  of  similar  tests  for  n  *  2000,  and 
10000. 


6.3.  The  exponential  distribution. 

Table  3  gives  results  for  the  exponential  distribution  corresponding 
to  those  in  Table  1  for  the  normal  distribution.  Again  the  bias  for  a 
multisample  estimate  must  be  given  by  entering  the  table  at  the  value 
of  m,  and  that  for  the  single-sample  estimate  is  given  by  entering  at 
n.  Thus  for  an  estimate  of  (  with  m  *  500  and  c  *  10,  say,  the 

bias  is  .0212,  but  for  n  =  cm  =  5000  the  bias  in  .0021,  almost  exactly 
one-tenth.  The  exact  expected  length  of  a  single-sample  95Z  confidence 
interval  is  given  in  column  9;  the  value  of  this  length  approximated  by 
(7)  is  in  the  last  column.  The  approximation  given  by  (7)  is  quite 
accurate  even  for  sample  sizes  which  would  be  considered  small  in  Monte 
Carlo  sampling  (for  example,  500). 

6. A.  The  uniform  distribution. 

For  estimates  of  £  for  the  uniform  distribution  on  [0,1], 

P 

comparisons  between  the  two  methods  are  straightforward.  For  simplicity 
consider  the  case  where  mp  is  an  integer.  Then  for  the  single-sample 
method,  k  -  np+1  and  for  the  multi-sample  method  £  •  mp+1. 
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Then  E(X(k))  =  (np+l)/(n+l)  and  E(Xp>  =  E(X((l))  =  E(X(jl))  -  (mp+l)/(m+l) . 

The  bias,  for  the  single  sample  estimate  is  (np+1) (n+l)-p  -  (l-p)/(n+l); 
for  the  multisample  estimate  it  is  (l-p)/(nri-l) .  The  bias  in  the 
estimates  is  always  positive.  The  variance  of  X^j  is 
V  =  {m(mp+l)(l-p)}/i(m+l)2(m+2)},  which  is,  to  order  1/m,  V«p(l-p)/m. 

Thus  the  variance  of  X/n.  =  X  ,  which  is  V/c,  is  p(l-p)/n,  to  order 
1/n.  This  is  also,  to  the  same  order,  the  variance  of  X^  in  one  large  run 
of  size  n.  Thus  the  C.I.  for  the  percentile  £  will  be  approximately 
the  same  length  when  obtained  by  either  the  multi-sample  method  or  the 
single-sample  method,  but  the  bias  in  the  estimate  is  reduced,  as  in 
earlier  examples,  by  a  factor  of  approximately  c  using  the  single¬ 
sample  method. 


Further  remarks.  Krutchkoff  (1980)  and  Reynolds  (1980)  have  discussed 
the  parallel  question  of  estimating  the  tail  probability  of  a  distribution 
corresponding  to  a  particular  value  of  x.  These  authors  have  also 
suggested  that  it  is  better  to  use  a  single  sample,  rather  than  several 
runs  of  a  smaller  size,  to  estimate  this  probability. 
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